# **Pipelined MIPS**

### The pipelined MIPS



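[Figure: the pipelined MIPS. Recovered labels: instruction addresses 0, 4, 8, C, 10 (hex); "Data (variables) reside" (label truncated).]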



#### **MIPS** instructions

#### R-type instructions: add, sub, slt, and, or, xor (& jr)

`sub rd, rs, rt   # rd = rs - rt`

means: subtract the contents of rt from the contents of rs and store the result in rd.

The R-type instructions we use are:

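| Instruction    | Operation                                       |
|----------------|-------------------------------------------------|
| add rd, rs, rt | rd = rs + rt                                    |
| sub rd, rs, rt | rd = rs - rt                                    |
| and rd, rs, rt | rd = rs AND rt                                  |
| or rd, rs, rt  | rd = rs OR rt                                   |
| xor rd, rs, rt | rd = rs XOR rt                                  |
| slt rd, rs, rt | rd = 1 if rs < rt, 0 otherwise (set less than)  |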

The last R-type instruction is:

`jr rs   # PC = rs`

The R-type instruction binary representation is:

| 6      | 5  | 5  | 5  | 5     | 6        |
|--------|----|----|----|-------|----------|
| 000000 | Rs | Rt | Rd | 00000 | FUNCTION |
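The field layout above can be sketched in Python as a small encoder/decoder. This is only an illustration: the `FUNCT` values are the standard MIPS32 function codes, which the slides do not list, and the helper names `encode_rtype` and `decode_rtype` are hypothetical.

```python
# Standard MIPS32 funct codes (assumption: the slides do not list them).
FUNCT = {"add": 0x20, "sub": 0x22, "and": 0x24, "or": 0x25,
         "xor": 0x26, "slt": 0x2A, "jr": 0x08}


def encode_rtype(name, rd=0, rs=0, rt=0):
    """Pack opcode(6)=0 | rs(5) | rt(5) | rd(5) | 00000 | funct(6)."""
    for reg in (rd, rs, rt):
        if not 0 <= reg < 32:
            raise ValueError("register numbers must fit in 5 bits")
    return (rs << 21) | (rt << 16) | (rd << 11) | FUNCT[name]


def decode_rtype(word):
    """Split a 32-bit R-type word back into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "funct": word & 0x3F,
    }


word = encode_rtype("sub", rd=3, rs=1, rt=2)   # sub $3, $1, $2
print(f"{word:08x}")                            # prints 00221822
print(decode_rtype(word))
```

Note that the binary field order (Rs, Rt, Rd) differs from the assembly operand order (rd, rs, rt).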


#### I-type instructions: addi, ori, beq, bne , lui, lw, sw

`addi rt, rs, imm   # rt = rs + sext(imm)` [imm is a 16-bit 2's complement value]

means: add imm to the contents of rs and put the result in rt.

`ori rt, rs, imm   # rt = rs OR imm` [imm is 16 bits, no sign extension]

means: OR imm with the contents of rs and put the result in rt.

`lui rt, imm   # rt = imm << 16` [imm is 16 bits, no sign extension]

means: shift imm left by 16 and put the result in rt (the 16 LSBs will be '0's).

`beq rs, rt, imm   # if rs == rt: PC = PC + 4 + 4*sext(imm), else: PC = PC + 4` [imm is a 16-bit 2's complement value]

means: if the contents of **rs** equal the contents of **rt**, jump to the instruction imm+1 places after the beq (backward when imm is negative).

**bne**: the same, but the branch is taken when the contents are not equal.

`lw rt, imm(rs)   # rt = M[rs + sext(imm)]` [imm is a 16-bit 2's complement value]

means: calculate the address by adding imm to the contents of **rs**, read from the D.Mem at that address and copy the value into **rt**.

`sw rt, imm(rs)   # M[rs + sext(imm)] = rt` [imm is a 16-bit 2's complement value]

means: calculate the address by adding imm to the contents of **rs**, and write the contents of **rt** to that address in the D.Mem.

The I-type instruction binary representation is:

| 6      | 5  | 5  | 16  |
|--------|----|----|-----|
| OPCODE | Rs | Rt | imm |
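As a rough illustration of how the 16-bit immediate is treated by the different I-type instructions, here is a minimal Python sketch. The function names are hypothetical; register contents are modelled as plain 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm):
    """Sign-extend a 16-bit 2's complement value to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def addi(rs_val, imm):
    return (rs_val + sext16(imm)) & MASK32      # imm is sign extended


def ori(rs_val, imm):
    return rs_val | (imm & 0xFFFF)              # imm is NOT sign extended


def lui(imm):
    return (imm & 0xFFFF) << 16                 # 16 LSBs become zero


def branch_target(pc, imm):
    """Address loaded into the PC when beq/bne is taken."""
    return (pc + 4 + 4 * sext16(imm)) & MASK32


print(hex(addi(5, 0xFFFF)))                 # imm = -1, prints 0x4
print(hex(ori(0, 0xFFFF)))                  # prints 0xffff
print(hex(lui(0x1234)))                     # prints 0x12340000
print(hex(branch_target(0x00400000, 3)))    # prints 0x400010
```

With imm = 3 the branch lands four instructions after the beq, which matches the "imm+1 instructions forward" description.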


#### J-type instructions: j, jal

`j imm26   # PC = PC[31:28] || 4*imm26` [imm26 is 26 bits]

means: jump to the word specified by the 26-bit imm in the instruction.

`jal imm26   # PC = PC[31:28] || 4*imm26, $31 = PC + 4` [imm26 is 26 bits]

means: jump to the word specified by the 26-bit imm in the instruction, and also keep the address of the next instruction in register \$31.

The J-type instruction binary representation is:

| 6      | 26         |
|--------|------------|
| OPCODE | 26 bit imm |
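The jump address calculation can be sketched as follows. This assumes the upper four bits are taken from PC + 4 (the Fetch Unit described later uses PC\_plus\_4\_pID[31:28]); the function names are hypothetical.

```python
def jump_target(pc, imm26):
    """Concatenate the 4 MSBs of PC+4 with the 26-bit word address times 4."""
    upper = (pc + 4) & 0xF0000000
    return upper | ((imm26 & 0x03FFFFFF) << 2)


def jal(pc, imm26):
    """Return (new PC, value written to $31)."""
    return jump_target(pc, imm26), (pc + 4) & 0xFFFFFFFF


print(hex(jump_target(0x00400000, 0x100001)))   # prints 0x400004
new_pc, ra = jal(0x00400010, 0x100000)
print(hex(new_pc), hex(ra))                     # prints 0x400000 0x400014
```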

## R-type instructions





IF – Inst. Fetch

IR = Imem[PC], PC = PC+4



ID - Inst. Decode

A = rs, B = rt

(& decode control signals)



**EX- Execute** 

**ALUOUT = A op B** 



**MEM – Memory** 

In R-type: wait 1 ck (no memory access)



WB – write back

Rd = ALUOUT

## lw instruction





IF – Inst. Fetch

IR = Imem[PC], PC = PC+4



ID - Inst. Decode

A = rs, B = rt

(& decode control signals)



**EX- Execute** 

ALUOUT = A+sext(imm)



**MEM – Memory** 

In lw:

MDR = M[ALUOUT]



WB – write back

Rt = MDR
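The five steps of lw can be traced with a small Python sketch that uses the same intermediate names (IR, A, B, ALUOUT, MDR). It is a behavioural illustration of one instruction only, not of the pipelined hardware; `run_lw` and the dictionary-based memories are assumptions made for the example.

```python
def sext16(imm):
    """Sign-extend a 16-bit 2's complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def run_lw(pc, imem, regs, dmem):
    """Walk one lw instruction through IF, ID, EX, MEM and WB."""
    # IF: fetch the instruction and advance the PC
    ir = imem[pc]
    pc = pc + 4
    # ID: read the register file (B is read but not used by lw)
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a, b = regs[rs], regs[rt]
    # EX: compute the memory address
    aluout = (a + sext16(ir & 0xFFFF)) & 0xFFFFFFFF
    # MEM: read the data memory
    mdr = dmem[aluout]
    # WB: write the loaded value into rt
    regs[rt] = mdr
    return pc


regs = [0] * 32
regs[1] = 0x100
imem = {0x00400000: 0x8C220008}        # lw $2, 8($1)
dmem = {0x108: 0xDEADBEEF}
next_pc = run_lw(0x00400000, imem, regs, dmem)
print(hex(regs[2]), hex(next_pc))      # prints 0xdeadbeef 0x400004
```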

## sw instruction





IF – Inst. Fetch

IR = Imem[PC], PC = PC+4



ID - Inst. Decode

A = rs, B = rt

(& decode control signals)



**EX- Execute** 

ALUOUT = A+sext(imm)



**MEM – Memory** 

In sw:

M[ALUOUT] = B dlyd (the B value, delayed by one clock so that it arrives in the MEM stage together with ALUOUT)



WB – write back

Do nothing

## j or beq instructions





IF – Inst. Fetch

IR = Imem[PC], PC = PC+4



ID – Inst. Decode

In jump:

PC = jump adrs

In beq:

PC = branch adrs if rs==rt

## Pipelined operation





[Figure: pipelined operation. Recovered labels: IF - Inst. Fetch (IR = Imem[PC], PC = PC+4); ID (decode control signals).]





















### **Pipelining timing representation**











If we rotate that, we see the same picture we saw in the previous slides.














From now on, we will use this scheme to describe what happens in the CPU.
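A timing scheme of this kind can be generated with a short Python sketch. It assumes an ideal pipeline with no stalls or flushes, so each instruction simply starts one clock after the previous one; the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def timing_rows(n_instructions):
    """One row per instruction, one column per clock cycle."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage          # instruction i enters IF at cycle i
        rows.append(row)
    return rows


def format_diagram(rows):
    """Render the rows as a fixed-width text table."""
    header = "      " + "".join(f"{'ck' + str(c + 1):<5}" for c in range(len(rows[0])))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"inst{i + 1} " + "".join(f"{stage:<5}" for stage in row))
    return "\n".join(lines)


print(format_diagram(timing_rows(3)))
```

Reading a single column from top to bottom shows which instruction occupies each stage in that clock cycle.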

# HW2 – the Fetch Unit

## **The Fetch Unit**



[Figure: the Fetch Unit. Recovered label: = x"00400004" (the fixed Rs value used for jr\_adrs).]

#### Names & definition of signals inside the Fetch Unit:

#### You must use these exact signal names in your design.

- **PC\_reg** a 32 bit register. **Reset** should force PC\_reg = 0x00400000.
- **PC\_plus\_4** a 32 bit signal that has the PC\_reg value + 4.
- **PC\_plus\_4\_pID** a registered version of PC\_plus\_4 to be used in the ID phase. This is why we added \_pID at the end of that signal name.
- **branch\_adrs** a 32 bit signal equal to PC\_plus\_4\_pID + (sext(imm) << 2).
   This is the address to be loaded into the PC when a successful branch is performed.
   The imm signal is made of the lower 16 bits of IR\_reg (see IR\_reg in #8 below).
- **jump\_adrs** a 32 bit signal made of PC\_plus\_4\_pID[31:28] & IR\_reg[25:0] & b"00", i.e., the jump address in words multiplied by 4. This is the address to be loaded into the PC when a j or a jal instruction is performed.
- **jr\_adrs** a 32 bit signal made of the Rs value in a jr instruction.
   Since we do not have a GPR file, we set the Rs value to x"00400004".
   In the complete CPU this will be the address to be loaded into the PC when the instruction is jr.
- **PC\_source** a 2 bit signal. When "00", PC\_reg is loaded with PC\_plus\_4; when "01", with branch\_adrs; when "10", with jr\_adrs; when "11", with jump\_adrs.
- **IR\_reg** a 32 bit register that has the instruction we read from the IMem. This register is part of the IMem (the IMem is an already designed component we use in the Fetch Unit).
- **imm** the 16 LSBs of IR\_reg.
- **sext\_imm** sign extension of imm to 32 bits.
- **opcode** the 6 MSBs of IR\_reg. We should determine PC\_source from the instruction opcode
   (j, jal: 11; beq, bne: 01; jr: 10; any other instruction: 00).
- **HOLD** This signal is meant to freeze all registers when it is "1". Very important!
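The signal definitions above can be checked against a small behavioural model before writing the VHDL. The following Python sketch is not the required design. It assumes the standard MIPS opcodes (j = 2, jal = 3, beq = 4, bne = 5) and detects jr as opcode 0 with funct 8, none of which is spelled out in the slides. It also assumes that reset clears IR\_reg and PC\_plus\_4\_pID to 0 and that unprogrammed IMem locations read as 0. Since PC\_source depends only on the opcode here, branches are always treated as taken.

```python
MASK32 = 0xFFFFFFFF
RESET_PC = 0x00400000
JR_ADRS = 0x00400004                    # fixed Rs value, no GPR file in HW2
OP_J, OP_JAL, OP_BEQ, OP_BNE = 0x02, 0x03, 0x04, 0x05   # assumed opcodes
FUNCT_JR = 0x08                                          # assumed funct


def sext16(imm):
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def pc_source(ir_reg):
    """2-bit PC mux select derived from the instruction in IR_reg."""
    opcode = (ir_reg >> 26) & 0x3F
    if opcode in (OP_J, OP_JAL):
        return 0b11
    if opcode in (OP_BEQ, OP_BNE):
        return 0b01
    if opcode == 0 and (ir_reg & 0x3F) == FUNCT_JR:
        return 0b10
    return 0b00


def pc_mux(pc_reg, pc_plus_4_pid, ir_reg):
    """Value that will be loaded into PC_reg on the next clock."""
    pc_plus_4 = (pc_reg + 4) & MASK32
    branch_adrs = (pc_plus_4_pid + (sext16(ir_reg & 0xFFFF) << 2)) & MASK32
    jump_adrs = (pc_plus_4_pid & 0xF0000000) | ((ir_reg & 0x03FFFFFF) << 2)
    return [pc_plus_4, branch_adrs, JR_ADRS, jump_adrs][pc_source(ir_reg)]


def clock_edge(state, imem, reset=False, hold=False):
    """Return the register values after one rising clock edge."""
    if reset:
        return {"PC_reg": RESET_PC, "PC_plus_4_pID": 0, "IR_reg": 0}
    if hold:
        return dict(state)              # HOLD freezes every register
    pc_reg = state["PC_reg"]
    return {
        "PC_reg": pc_mux(pc_reg, state["PC_plus_4_pID"], state["IR_reg"]),
        "PC_plus_4_pID": (pc_reg + 4) & MASK32,
        "IR_reg": imem.get(pc_reg, 0),
    }


imem = {0x00400000: 0x08100004}         # j 0x00400010
state = clock_edge({}, imem, reset=True)
for _ in range(2):
    state = clock_edge(state, imem)
    print({name: hex(value) for name, value in state.items()})
```

In this model the jump takes effect one clock after the j instruction reaches IR\_reg, because the address is calculated from IR\_reg and PC\_plus\_4\_pID in the ID phase.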

## The Simulation project



#### The files required to run the simulation are:

#### Group #1 – The design files

1. **HW2\_top\_4sim.vhd** The pre-prepared top file connecting the Fetch\_Unit, clock\_driver & BYOC\_Host\_Intf
2. **Fetch\_Unit.vhd** your design implementing figure 1

#### Group #2 – The infrastructure files [pre-prepared]. The same for HW2, HW4, HW5, HW6.

3. **BYOC\_clock\_driver\_4sim.vhd** component dividing the CK from 50MHz to 25MHz
4. **BYOC\_Host\_Intf\_4sim.vhd** component having the IMem (& creates reset & hold)

#### Group #3 – The simulation files [different for each HW]

5. **SIM\_HW2\_TB.vhd** The TB vhd file prepared in advance
6. **SIM\_HW2\_TB\_data.dat** The data file read by the TB during simulation for comparison
7. **SIM\_HW2\_program.dat** The program file for simulation.
8. **SIM\_HW2\_filenames.vhd** The actual path information of the two dat files.

We prepared an "empty" **Fetch\_Unit.empty** file in which we already did the following:

- Defined the I/O pins of the Fetch\_Unit\_4sim design
- Defined all necessary internal signals inside the Fetch\_Unit\_4sim design
- Connected other signals, e.g., signals to be outputted to the TB & rdbk signals

You have to rename it to Fetch\_Unit.vhd and write the equations describing the logic circuitry of Figure 1!




## **Simulation report**

- You should submit a zip file of your entire simulation & implementation project.
   Your zip file should have two directories: Simulation & Implementation.
   In the Simulation directory you'll have 3 sub-directories:
   - Src\_4sim: here you put all of the \*.vhd sources and the \*.dat files (used by the TB)
   - Sim: here you should have the Fetch\_unit\_4sim project created by the simulator
   - Docs: here you put your simulation report, with your ID numbers (names optional)
- Run the simulation to 5500 ns.
- In the doc file, attach screen captures describing the simulation you ran. All signals mentioned in section 1a above should be presented in the screen captures. Show at least the first 10 ck cycles following the end of the reset pulse and make the values of all signals readable.
- In that doc file you also need to answer the questions appearing in the HW2 doc.

## The Implementation project



#### The files required to run the implementation are:

#### Group #1 – The design files

1. **HW2\_top.vhd** The top file after renaming and removal of all signals outputted to the TB
2. **Fetch\_Unit.vhd** your design implementing figure 1

#### Group #2 – The infrastructure files

3. **BYOC\_clock\_driver.vhd** A CK divider from 50MHz to 25MHz that has a BUFG driver inside
4. **BYOC\_Host\_Intf.ngc** The pre-compiled component including the IMem and creating the reset & hold signals and the infrastructure interfacing to the PC which is required to run the implemented design.

#### Group #3 – The implementation files

5. **BYOC.ucf** The file listing which signals are connected to which FPGA pins in the Nexys2 board.

#### A reminder: We connected the rdbk signals as follows:

```
rdbk0       => PC_reg,
rdbk1       => PC_plus_4,
rdbk2       => branch_adrs,
rdbk3       => jr_adrs,
rdbk4       => jump_adrs,
rdbk5       => PC_plus_4_pID,
rdbk6       => IR_reg,
rdbk7       => PC_source (bits 1:0),
rdbk8       => RESET (bit 0),
rdbk9       => output of PC mux (32 bit signal),
rdbk10 - 15 => 0x00000000
```

- We run ISE and get a bit file
- Load the design using the Adept SW
- Load the IMem via the BYOCIntf SW
- Then use it also to apply a single ck and check the rdbk signals




# Now it is your turn!

Thanks for listening!

# **Backup slides**



# The pipelined MIPS



Fig. 3 - The pipeline processor




## The pipelined processor with support



Fig. 4 - The processor & course infrastructure

## The VGA screen

A word (32 bits) is displayed as 32 black-and-white pixels. Bit 0 is the leftmost pixel.
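The bit-to-pixel mapping can be illustrated with a one-function Python sketch. The choice of `#` for a '1' bit and `.` for a '0' bit is an assumption; the slides do not say which bit value is shown as white.

```python
def word_to_pixels(word, on="#", off="."):
    """Render a 32-bit word as 32 characters, bit 0 first (leftmost)."""
    return "".join(on if (word >> bit) & 1 else off for bit in range(32))


print(word_to_pixels(0x00000001))   # only the leftmost pixel is set
print(word_to_pixels(0x80000000))   # only the rightmost pixel is set
```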

